Abstract

This position paper discusses some of the difficulties
posed by developing user interfaces for mobile augmented
reality systems. We argue that these difficulties are a
superset of the challenges encountered by mobile computers
that use 2D textual displays, and we discuss the potential
role that artificial intelligence methods could play.

1  Introduction 

Mobile computers will dramatically change the way
in which information is delivered to individuals.
Through the use of laptop computers, personal digital
assistants and mobile telephones, it is possible
to read email and even surf the web. However, the
power of mobile computing is that the information
being displayed can be tailored to the user’s
current tasks and contexts. Relatively unobtrusive
information delivery systems, such as the Wearable
Remembrance Agent [10], have been developed.

Recent developments in portable computing hardware,
position and orientation trackers, and see-through
displays have begun to make mobile augmented reality
systems feasible. Augmented reality (AR) integrates
virtual information with the user’s physical
environment. Graphics-based AR can provide a user with
a “heads up display” in which computer graphics are
spatially registered with, and overlaid on, geographic
locations and real objects.

Mobile AR becomes a superset of conventional
wearable computers. However, AR offers a fundamentally
different way of displaying the information
and providing the users with tools to interact with
that information. Rather than just being given 2D
textual displays, a user sees information and is able to
interact with it directly within the user’s own 3D space.
The extra dimension means that augmented reality
has the potential to be much more valuable than
traditional wearable systems. However, with this extra
freedom comes increased complexity. A display which is
heavily cluttered is unreadable. Poorly positioned labels
or annotations can be confusing or highly misleading.

In this position paper we argue that Artificial
Intelligence (AI) techniques have the potential to play
a significant role in developing intuitive interfaces
which will help to mitigate or overcome some of the
difficulties involved. We survey some of the technology
and approaches which have been used, and identify
key areas where we feel developments are needed.
The structure of this paper is as follows. The next
section discusses the application scenario in more
detail. Section 3 discusses the problem of generating
and managing a graphical display to minimize the
problems of clutter and information overload.
Section 4 discusses the issues of how a user can issue
instructions and commands to a system. We summarize
and conclude in Section 5.


2  Application  Scenario 


Our goal is to develop software systems and interaction
techniques to support multiple, mobile, collaborating
users with wearable AR systems [6]. These
users would interact with other users of stationary
VR, AR, and desktop systems. To this end, several
systems have been developed. One such system, shown
in Figure 1, is the Battlefield Augmented Reality
System (BARS), which is currently under development
at NRL in collaboration with Columbia University.
Built mostly from commercial off-the-shelf (COTS)
products, the system is composed of 6DOF trackers
(an Ashtech GG Surveyor real-time-kinematic GPS for
position, an InterSense IS300Pro for orientation),
a see-through head-worn display (Sony LDI-D100B
Glasstron), a wireless network and a wearable computer
with 3D hardware graphics acceleration. The purpose of
the system is to provide a user with situation awareness:
given a set of tasks to complete in an urban environment,
the system must provide the user with information
which is pertinent to those tasks. The types of
information which can be displayed include the names of
buildings, routes which have to be followed, objectives
which have to be achieved and the locations of
other users. As an example, Figure 2 shows an actual
image taken through the see-through head-mounted
display.

The display in Figure 2 appears to be relatively
straightforward. However, if the display becomes
much more complicated, significant improvements in
systems technology will be required. We focus on
two such issues: display management through
information filtering, and intuitive 3D user interaction
paradigms.


Figure 1: Prototype mobile augmented reality system.
This system is constructed from COTS products.


Figure 2: Sample output from the prototype augmented
reality system. In this scenario, a user follows
a route (triangles) around the edge of a building.
A sniper is visible beyond.

Figure 3: Sample output from the prototype augmented
reality system when all information about the
system is displayed.

3  Managing  the  Displays 

Arguably one of the most fundamental problems with
an AR system is the potential for information overload.
In a dense environment, such as a city, a substantial
amount of information may be available. However,
naively displaying all available information can be
highly confusing. This is illustrated in Figure 3,
which shows the output from the system, rendered in
the same view as Figure 2, with all data from the
database displayed. As can be seen, the result is
extremely confusing and difficult to interpret.
Initial thinking suggests that a simple
line-of-sight analysis might be sufficient to overcome
this difficulty. One pragmatic approach, which was
utilized in the ARQuake system, is to build a 3D
model of the environment and make it black so that
occluded objects are not visible [12]. Although this
approach is suitable for games, it is not suitable for
information systems, where the ability of the system
to provide “X-ray vision” and let the user see through
obstructions is extremely important. For example,
Figure 2 shows that, if a user follows the route, they
will become visible to a sniper as they walk around a
building.

To overcome these difficulties, it is necessary to
apply some form of autonomous information filtering.
This technique attempts, from a user’s current
context and task set, to choose the most appropriate
subset of available information. However, the method
of choosing the information is a function of the
information requirements for any particular task. In [7]
and [11] we presented a two-step framework for
information filtering. The first step employs the spatial
model of interaction [1]: a user is surrounded by
a focus region and all objects are surrounded by a
nimbus region. If an object’s nimbus intersects with
the user’s focus, the object is a potential candidate to
be shown. The second step uses task-dependent logic
to cull the list of objects to a critical subset to be
shown. An approach based on dot products was developed
to score the relevance of an object with respect to
a task.
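
The two-step framework can be illustrated with a minimal
sketch. It is not the implementation from [7] or [11]: it
assumes, purely for illustration, that focus and nimbus
regions are spheres, that each object carries a property
vector with the same layout as the user’s task vector, and
that a fixed threshold performs the cull. The names
SceneObject, in_focus, relevance and filter_objects are
hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    position: tuple    # (x, y, z) in metres (assumed units)
    nimbus: float      # radius of an assumed spherical nimbus
    properties: tuple  # hypothetical property vector


def in_focus(user_pos, focus_radius, obj):
    # Step 1: two spheres intersect when the distance between their
    # centres is no greater than the sum of their radii.
    return math.dist(user_pos, obj.position) <= focus_radius + obj.nimbus


def relevance(task_vector, obj):
    # Step 2: dot product of the task vector and the property vector.
    return sum(t * p for t, p in zip(task_vector, obj.properties))


def filter_objects(user_pos, focus_radius, task_vector, objects, threshold):
    # Keep objects whose nimbus meets the focus, then cull by relevance.
    candidates = [o for o in objects if in_focus(user_pos, focus_radius, o)]
    scored = [(relevance(task_vector, o), o) for o in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [o.name for score, o in scored if score >= threshold]


objects = [
    SceneObject("building", (10.0, 0.0, 0.0), 5.0, (1.0, 0.0)),
    SceneObject("sniper", (40.0, 0.0, 0.0), 20.0, (0.0, 1.0)),
    SceneObject("far_depot", (500.0, 0.0, 0.0), 5.0, (1.0, 1.0)),
]
# Hypothetical task vector that weights threats far above landmarks.
shown = filter_objects((0.0, 0.0, 0.0), 30.0, (0.2, 1.0), objects, 0.5)
```

In this example the distant depot fails the focus test, the
building passes it but scores below the threshold, and only
the sniper is shown. A large nimbus lets an important object
become a candidate from further away.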

Although this algorithm dramatically reduces the
clutter in a display (compare Figures 2 and 3), it has
a number of important shortcomings. The most important
of these is that the current implementation is
very limited in its capability to encode both the
context (according to a user’s task vector) and domain
knowledge (a set of parameterized functions). Better
context management can be achieved through the
use of more sophisticated sensors and processing
algorithms to detect context. For example, Golding [4]
applied simple machine learning techniques (such as
Bayesian nets) to deduce the context of a user from
a range of sensors.

4  User  Interfaces 

The mobile outdoor system is designed to aid a user
in completing a task. It must provide information
to the user, and vice versa, without distracting the
user from that task. We feel the system can provide
the best interface by monitoring many sources
of data about the user and using intelligent heuristics
to combine that data with information about the
environment and task to produce a highly usable
interface.

The system contains a detailed physical model of
objects in the real environment that is used to generate
the registered graphical overlay. This model is
stored in a shared database that also contains
information about the objects such as a general
description, threat classification, and so on. Using
knowledge representation and reasoning techniques, we can
also store in this database information about the
objects’ relevance to each other and to the user’s task.

We believe that early uses of BARS for situational
awareness will mainly consist of users picking objects
in the environment, either to find out more about
them (“Where is the electrical cut-off switch?”) or
to add information about them (“I saw a sniper on
the third floor of that building”). Thus, we need to
find a way to let the user easily pick items in the
environment.

Using tracking sensors, we can measure the user’s
position and head orientation in the environment,
and use that information to determine a reasonable
approximation of the user’s gaze direction, though we
are looking at eye trackers, such as the one described
in [13], to make this measurement more accurate. We can
also track the user’s hands relative to the body position.
This information gives us rays of gaze and gesture
that beam through the 3D model.

Once we have the rays of gaze and gesture, we have
a set of possible objects at which the user may be
looking and pointing that sit along and near these
rays. This set will usually be larger than a single
object for each mode because our models are of
dense urban terrain and many objects will sit along or
near these rays. Also, because this is an AR system,
even if an object is occluded in the real world, it may
be visible on the AR display, so that a user may pick
an object whose physical counterpart is not visible at
that time.
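
As a rough sketch of how such a candidate set might be
gathered, the code below collects every object whose centre
lies within a tolerance of a gaze or gesture ray. The
reduction of objects to centre points, the tolerance
parameter, and the function names are assumptions made for
illustration. No occlusion test is applied, so objects
hidden behind others remain selectable.

```python
import math


def normalize(v):
    # Scale a 3D vector to unit length.
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def distance_to_ray(origin, direction, point):
    # Return (perpendicular distance, distance along the ray) for a point.
    d = normalize(direction)
    rel = tuple(p - o for p, o in zip(point, origin))
    along = sum(r * c for r, c in zip(rel, d))
    if along < 0:
        # Point is behind the origin: the nearest ray point is the origin.
        return math.sqrt(sum(r * r for r in rel)), 0.0
    closest = tuple(o + along * c for o, c in zip(origin, d))
    return math.dist(point, closest), along


def candidates_along_ray(origin, direction, objects, tolerance):
    # Collect objects near the ray, ordered from nearest to farthest.
    # Occluded objects are deliberately kept, since the AR display
    # may still draw them.
    hits = []
    for name, centre in objects.items():
        offset, along = distance_to_ray(origin, direction, centre)
        if along > 0 and offset <= tolerance:
            hits.append((along, name))
    return [name for along, name in sorted(hits)]


# Hypothetical scene: centres of four objects in metres.
scene = {
    "door": (5.0, 0.5, 0.0),
    "building": (20.0, -1.0, 0.0),
    "tree": (10.0, 8.0, 0.0),
    "vehicle": (-5.0, 0.0, 0.0),
}
gaze_hits = candidates_along_ray((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), scene, 2.0)
```

Here the tree is too far from the ray and the vehicle is
behind the user, so the candidates are the door and the
building behind it. The same function can be run on the
gesture ray and the two sets combined.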

We will use an inductive heuristic, such as the ID3
(Iterative Dichotomiser 3) decision-tree induction
algorithm [9], to determine the single object at which
a user is looking or pointing out of the many possible.
If the heuristic picks the object the user really wants,
it is considered a successful determination. The user
will have the option of rejecting the selection and
choosing another object, creating a feedback mechanism.
Thus, the heuristic will analyze data about objects in
the set of possible selections, such as their task
relevance values and whether or not they are occluded,
as well as its own previous successes, to pick the
selected object.
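
ID3 builds a decision tree by repeatedly splitting on the
attribute with the highest information gain. The sketch
below shows only that split criterion, applied to a
made-up log of past selections in which each row records
whether the user accepted the system’s pick. The attribute
names (occluded, relevance) and the accepted label are
hypothetical, and a full tree builder is omitted.

```python
import math
from collections import Counter


def entropy(labels):
    # Shannon entropy, in bits, of a list of class labels.
    total = len(labels)
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(rows, attribute, label_key):
    # Reduction in label entropy obtained by splitting on one attribute.
    base = entropy([r[label_key] for r in rows])
    remainder = 0.0
    for value in set(r[attribute] for r in rows):
        subset = [r[label_key] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_attribute(rows, attributes, label_key="accepted"):
    # ID3 splits on the attribute with the largest information gain.
    return max(attributes, key=lambda a: information_gain(rows, a, label_key))


# Hypothetical feedback log: did the user accept the selected object?
history = [
    {"occluded": False, "relevance": "high", "accepted": True},
    {"occluded": True, "relevance": "high", "accepted": True},
    {"occluded": False, "relevance": "low", "accepted": False},
    {"occluded": True, "relevance": "low", "accepted": False},
]
split = best_attribute(history, ["occluded", "relevance"])
```

In this toy log, task relevance separates accepted from
rejected picks perfectly while occlusion carries no
information, so relevance is chosen as the first split.
Each rejection by the user would add a row to the log, which
is one way the feedback mechanism could feed the heuristic.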


So far we have addressed how AI can help the user
pick objects in the environment in a natural way.
Now that we can select items, how do we decide what
to do with them without resorting to cumbersome or
non-intuitive input devices? For this purpose we will
look at multi-modal inputs.

4.1  Multi-modal  Inputs 

We have determined a way to allow the user to select
one or more objects on which to perform an operation.
However, how will the user specify the operation
to be performed? One way is to superimpose
a traditional 2D WIMP interface on the display [3].
For example, a context menu may appear next to the
projection of the selected object, allowing the user to
pick an operation. However, in a wearable outdoor
system, the user will not have a mouse and keyboard
with which to interact with the WIMP interface, so we
will look to more natural interfaces.

Gesturing and speaking are natural human interaction
techniques. Multi-modal interfaces involving
gesturing and speaking have shown promising
results [8]. Additionally, there are mature systems
that use these techniques to determine the intent of a
user, such as QuickSet [2], which uses a set of agents
that communicate through a centralized blackboard
or facilitator.

As with object selection, we propose that a heuristic
can be developed to use the results of multi-modal
integration algorithms along with user feedback to
create a highly usable interface for performing
actions on selected objects.

5  Conclusions 

Mobile augmented reality has the potential to be an
extremely powerful paradigm for presenting information
to a mobile user. However, the problems which
are faced in developing an effective user interface are
a superset of those faced by mobile computers with
2D textual displays. We have discussed two problems:
the control of what information will be displayed
and the way in which a user can interact with the
system. In both cases, we believe that many AI
principles will greatly contribute towards the design of
these interfaces. In a related paper, our colleagues at
Columbia University discuss how limitations in the
resolution of a tracking system provide even more
constraints on the design and operation of a user
interface [5].